.NET Regexes: Why Groups and Captures?

Mark Leighton Fisher on 2007-01-26T18:25:50

When you need to grab part of a regex match for later use, why does .NET require dealing with both groups and captures? In my experience, the canonical method numbers captures from 1 in the order of their opening parentheses, so that in a regex like:

    ^[^"']*((.)([^"']*)(.))

capture $1 is the whole quoted string (quotes included), captures $2 and $4 are the quotes, and capture $3 is the text between the quotes.

In the .NET regex classes, you can't directly access the captures; you must go through their containers, the Groups. What would match my own mental model is for each Capture to be linked somehow to the Captures nested inside it, and for each Regex to be linked to all of its Captures in opening-parenthesis order, as in the example above. If there are no Captures, there is no link. ("Link" does not mean a literal link; it could just as well be some kind of collection reference.)

I suspect there is a use for Groups, but I'm not sure what it is. Maybe when you are generating your regexes with a program you could use a Group to simplify your code.


Terminology

JonathanWorthington on 2007-01-28T17:09:08

To translate from Perl terminology to .NET, think of a group as a capture. If you just want the value that you would get in $1, $2, etc., then access group 1, 2, etc. and use the .Value property (I think).
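As a quick check of that mapping, here is a minimal C# sketch that runs the regex from the post and reads each numbered group through Match.Groups[n].Value. The class name QuotedStringDemo, the helper GroupValues and the sample input are made up for illustration; only Regex, Match and Group come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedStringDemo
{
    // The regex from the post, written as a C# verbatim string
    // (inside @"...", a doubled "" stands for one literal quote).
    private static readonly Regex Quoted =
        new Regex(@"^[^""']*((.)([^""']*)(.))");

    // Hypothetical helper: returns the values of groups 0..4 for the
    // first match, or an empty array if the input does not match.
    public static string[] GroupValues(string input)
    {
        Match m = Quoted.Match(input);
        if (!m.Success) return new string[0];

        var values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
            values[i] = m.Groups[i].Value;   // Groups[n] plays the role of Perl's $n
        return values;
    }

    public static void Main()
    {
        string[] g = GroupValues("say \"hello\" now");
        for (int i = 0; i < g.Length; i++)
            Console.WriteLine($"Groups[{i}] = {g[i]}");
        // Groups[0] = say "hello"
        // Groups[1] = "hello"
        // Groups[2] = "
        // Groups[3] = hello
        // Groups[4] = "
    }
}
```

Groups[0] is the whole match; groups 1 through 4 line up with $1 through $4 as numbered in the post.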

Captures are something else in .NET: they are useful if you have a quantified group. So if you have

    ([ab])*

and matched it against the string aabba, then your group object would have just the value "a" (the last thing the group matched), but there would be 5 capture objects held in the collection within the group object, having the values "a", "a", "b", "b" and "a" respectively.

At least, that's what I remember of it. I'm a little hazy. :-)
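The quantified-group behaviour described above can be checked with a short C# sketch. The class name CapturesDemo and its two helper methods are hypothetical; Group.Value, Group.Captures, Capture.Index and Capture.Value are the real members being exercised.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    private const string Pattern = "([ab])*";

    // Hypothetical helper: the value of group 1, i.e. what Perl's $1 would hold.
    public static string GroupValue(string input)
    {
        return Regex.Match(input, Pattern).Groups[1].Value;
    }

    // Hypothetical helper: every substring group 1 captured, in match order.
    public static string[] CaptureValues(string input)
    {
        Group g = Regex.Match(input, Pattern).Groups[1];
        var values = new string[g.Captures.Count];
        int i = 0;
        foreach (Capture c in g.Captures)
            values[i++] = c.Value;
        return values;
    }

    public static void Main()
    {
        // The group itself reports only the last iteration of the quantifier.
        Console.WriteLine(GroupValue("aabba"));                        // a

        // The Captures collection keeps one entry per iteration.
        Console.WriteLine(string.Join(",", CaptureValues("aabba")));   // a,a,b,b,a
    }
}
```

So a Group answers "what did this parenthesis end up holding?", while its Captures answer "what did it hold on each pass through the quantifier?". For a group that is not quantified, the collection has a single entry equal to the group's value.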